Abstract



A novel way of defining limits in classical statistics is proposed. This is
a natural extension of Neyman's original method, and has the desirable
property that only information relevant to the problem is used in making
statistical inferences. The result is a strong restriction on the allowed confidence
bands, excluding in full generality pathologies such as empty confidence regions or
unstable solutions. The method is completely general and directly applicable
to all problems of limits.

I. INTRODUCTION



The concept of Confidence Region for a parameter at a given Confidence Level is a
centerpiece of classical statistics and was first introduced by Neyman [1]. It gives a definite
meaning to the making of statistical inferences about the region where the value of an
unknown parameter might fall, without any assumption on whether the parameter can be
attributed some probability distribution and what it might be. An alternative approach
to setting acceptance regions for a parameter is due to Bayes; it assumes instead,
and explicitly incorporates, the information from a probability distribution of the parameter,
which is supposed to be known before the measurement of the data set in hand, and is
therefore called the "a priori" distribution.

With regard to the choice between the two methods of statistical inference, the Author 
shares a common opinion that Bayesian methods are very useful whenever there is a solid 
ground for establishing the "a priori" parameter distribution, which this method readily 
exploits in an optimal way, but the classical methods are the only reasonable choice whenever
this does not happen. Unfortunately the measurements of physics quantities belong almost 
always to the second class. The widespread preference of the physicists for classical methods 
of setting acceptance regions seemed recently to weaken when it was realized that the usual 
procedures for setting limits in the classical framework can lead in some cases to highly 
counter-intuitive results. 

Several solutions to this unpleasant situation have been proposed, some of them requiring
a partial fallback on Bayesian concepts [|3],§; some authors have even argued that the classical method
was fundamentally weak, and could not work without the supplement of some Bayesian
ingredient.

Other authors ||f|], defended the classical point of view by proposing some alternative 
methods for setting limits that eliminate the unpleasant results while still adhering to Ney- 
man's prescription. The present work follows that same line of looking for meaningful results 
within the classical approach, avoiding any Bayesian contamination. However, I argue that
none of the previous proposals is completely satisfactory, and that a deeper revision of current
ideas is needed in order to really solve the difficulties, leading to conclusions very different
from those of past work on the subject.

It is worth noting that the insistence on classical methods should not be taken to imply 
that Bayesian methods are not very useful in the more limited field where they are
unambiguously applicable. 

In Sec. II a few examples of problematic limits are discussed, some of which appear not
to have been previously considered. In Sec. III A I analyze the reasons for the physicist's
dissatisfaction and what they reveal about the incompleteness of the classical CL definition
by Neyman, and in Sec. III B I propose a general solution of these issues completely contained
in the realm of classical statistics. In Sec. IV the most important features of the proposed
approach are discussed, with brief notes on some specific examples.

II. PROBLEMS WITH STANDARD CLASSICAL LIMITS 
A. Definitions and notations

Let $\mu \in M$ indicate some unknown parameters, and $x \in X$ a random variable we can
observe, whose probability distribution $p(x|\mu)$ (pdf for short) depends in some way on the
unknowns $\mu$. Both $\mu$ and $x$ can be arbitrary objects, e.g. they can be vectors of real
numbers of any length. When the observable is continuous a probability density rather
than a discrete distribution is necessary to describe it, but for simplicity the same notation
$p(x|\mu)$ will be used, and the distinction will be explicitly noted only when necessary. In
both cases $p(x \in S|\mu)$ will indicate the total probability for the observable to fall in a given
subset $S \subset X$, independently of whether it is obtained by a sum (discrete variable), an
integration (continuous), or both$^1$.

Let $B(x)$ be any function associating to each possible observed value of $x$ a subset of
values of $\mu$ ($B$ is intended to represent some algorithm to select "plausible" values of the
unknown $\mu$ on the basis of our observation). The classical definition of CL from Neyman can
then be stated as follows: the function $B$ ("confidence band") is said to have "Confidence
Level" equal to CL if, whatever the value of $\mu$, the probability of obtaining a value of $x$ such
that $\mu$ is included in the accepted region $B(x)$ is (at least) CL. In short:

$$CL(B) = \inf_{\mu} p(\mu \in B(x)|\mu) = 1 - \sup_{\mu} p(\mu \notin B(x)|\mu) \qquad (1)$$

Obviously the Confidence Level is a property of the band B as a whole, not of a confidence 
region associated to a particular value of x: it is quite possible for two different algorithms B 
and B', to give the same confidence region for some x, and still have very different Confidence 
Levels. This is the reason for the need of always deciding the algorithm B before making 
the actual measurement, clearly implied by the original formulation, but apparently often 
forgotten, and only recently clearly pinpointed |J. 
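
As a minimal sketch of Eq. (1) for a discrete problem, the snippet below computes the confidence level of a band as the smallest coverage over all parameter values. The toy pdf, the outcome labels and the two bands are hypothetical, chosen only to illustrate that two bands can assign the same region to one outcome and still have different confidence levels.

```python
def confidence_level(pdf, band):
    """Return the confidence level of `band`, as in Eq. (1).

    pdf[mu][x] is p(x|mu) for a discrete observable;
    band[x] is the set of parameter values accepted when x is observed.
    """
    coverages = []
    for mu, px in pdf.items():
        # Probability of observing an x whose region contains the true mu.
        coverages.append(sum(p for x, p in px.items() if mu in band[x]))
    return min(coverages)


# Hypothetical toy problem: two parameter values, three outcomes.
pdf = {
    "P": {"a": 0.7, "b": 0.2, "c": 0.1},
    "Q": {"a": 0.1, "b": 0.2, "c": 0.7},
}
band_1 = {"a": {"P"}, "b": {"P", "Q"}, "c": {"Q"}}
band_2 = {"a": {"P"}, "b": set(), "c": {"Q"}}  # same region for "a"

cl_1 = confidence_level(pdf, band_1)  # about 0.9
cl_2 = confidence_level(pdf, band_2)  # about 0.7
```

Both bands return the region {P} when "a" is observed, yet only the first reaches 90% CL, which is why the band has to be fixed before the measurement.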

Neyman's definition is so general that, after choosing the desired CL, there is a very
wide variety of bands $B$ satisfying it. In a generic case, confidence regions can be arbitrarily
complicated subsets of the $\mu$ space. One can even construct fractal confidence regions if one
likes to do so.

For this reason, some "rules" were soon invented to easily obtain simple confidence
regions with desirable properties. Most of them are based on ordering all possible values
of the observable $x$ according to some rule, and then determining the accepted region by
adding up in order as many values as needed for reaching the desired coverage, that is the
integral of the pdf over the accepted region. Common examples of rules are upper/lower
limits, based on ordering by increasing/decreasing value of $x$ (assumed to be a number),
"centered" bands (for unidimensional $x$, ordering by decreasing tail probability, which yields equal probabilities
in the upper and lower excluded regions), and the band obtained by ordering by decreasing
$p(x|\mu)$ || ("narrowest band", or "Crow band" in the following).

There is really nothing fundamental about these rules, but they have been so commonly used
that they have sometimes been identified with the very essence of the CL concept. For
this reason, when some examples were found that showed serious limitations of these rules,
their failure has sometimes been perceived as a failure of classical statistics as a whole, and
alternative solutions were often looked for in Bayesian concepts.

1 Note that $p(x \in \{x\}|\mu) = p(x|\mu)$ for discrete variables, while $p(x \in \{x\}|\mu) = 0$ for continuous
variables, independently of the value of the density $p(x|\mu)$ at the point $x$.

Obviously, other choices can be singled out within the huge space of classical solutions,
to give satisfactory solutions to those cases. In order to overcome the limitations of the other
methods, the new method of Likelihood Ratio (LR) ordering$^2$ has been recently proposed.
This amounts to ordering the observable values by decreasing $p(x|\mu)/p(x|\hat{\mu})$, where $\hat{\mu}$ represents
the maximum likelihood estimate of $\mu$, given $x$. This method appears to have distinct
advantages over the previous ones, and stirred great interest around this problem. However, it
does have limitations, which have inspired some amendments @,[|.

I will argue in the next subsection that the LR-ordering method and its modifications 
have pitfalls as serious as those of other methods they are intended to replace, and cannot 
therefore be considered a genuine solution. 

B. Specific examples 

I proceed now to examine some examples of problematic confidence bands. 

The pathologies encountered are essentially of two kinds. The first and more obvious is
when the confidence region happens to be the empty set. I avoid speaking of "unphysical
values" of the parameters because I find it a confusing terminology: in every problem the
parameters can assume values inside some domain, determined by the nature of the problem.
If that domain actually describes all conceivable values of parameters for which a $p(x|\mu)$
exists, then there is no meaning in referring to hypothetical values outside that domain:
they just do not exist as possible values for $\mu$. On the other hand, if the formulation of
the physical problem allows one to attach a meaning to other values of the parameters, they
should be taken into account from the start, and cannot be called "unphysical". Similar
considerations apply to the expression "the maximum of the likelihood function lies outside
the physical region": the expression usually really means that the maximum occurs on
the border of the parameter space, which does not pose particular problems and certainly
does not suggest arbitrary extrapolations of the likelihood function outside its domain of
existence.

The other possible pathology is to have "unreasonably small" confidence regions, which is
actually just a softer version of the previous one. It is less obvious to detect, but it should be
clear that it is just as unacceptable from the physicist's point of view. Also, it is potentially
more dangerous since the experiment's result will superficially appear to convey a great deal
of information. How do we know that a limit is too tight? A possible symptom of this
situation is when the limits become tighter with decreasing experiment sensitivity, as in the
example of Poisson with background below.



2 This is often referred to in the literature as "unified approach" due to its capability of producing 
a single band containing both "central" and "upper/lower" intervals, but that property is not of 
particular relevance in the present context, therefore the more explicit expression LR-ordering is 
adopted.

1. Poisson with background: a sensitivity paradox

Let us examine briefly this problem of Confidence Limits of great practical importance. 
The probability distribution is given by: 

$$p(n|\mu) = e^{-(\mu+b)}\,\frac{(\mu+b)^n}{n!} \qquad (2)$$

While the observed number of counts $n$ can only be non-negative, the presence of a background
$b$ constrains the overall mean $\mu + b$ of the Poisson to be larger than $b$, and therefore creates
the possibility of "negative fluctuations" in the form of the occurrence of far fewer observed
counts than the average level of background. The "usual" ordering rules mentioned in Sec.
II A readily produce empty confidence regions in that case.

The LR-ordering prevents this, but its results are counter-intuitive and hard-to-interpret 
as well. 

The problem appears clearly when comparing the results of experiments observing the
same number of counts, but affected by different levels of background. It is easy to see that
with the LR method the upper limit on $\mu$ goes to zero for every $n$ as $b$ goes to infinity, so
that a low fluctuation of the background entitles one to claim a very stringent limit on the signal.
This means that the limit can be much more stringent than in the case of zero observed
events and zero background. This is clearly hard to accept.
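
The dependence of the LR-ordered upper limit on the background can be checked numerically. The following is only a sketch under stated assumptions: a grid scan in $\mu$, a 90% CL, a truncation of the Poisson sum at `n_max`, and the constraint $\mu \geq 0$ for the maximum likelihood estimate; the function names are illustrative and not taken from any of the cited papers.

```python
import math


def poisson(n, lam):
    """Poisson probability of n counts for mean lam (lam may be 0)."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def lr_accepts(n_obs, mu, b, cl=0.90, n_max=60):
    """True if n_obs is in the LR-ordered acceptance region for signal mu."""
    ranked = []
    for n in range(n_max + 1):
        p = poisson(n, mu + b)
        mu_best = max(0.0, n - b)  # maximum likelihood estimate with mu >= 0
        ranked.append((p / poisson(n, mu_best + b), n, p))
    ranked.sort(reverse=True)  # decreasing likelihood ratio
    total = 0.0
    for _, n, p in ranked:
        total += p
        if n == n_obs:
            return True  # reached before the coverage was complete
        if total >= cl:
            return False
    return False


def lr_upper_limit(n_obs, b, cl=0.90, mu_max=10.0, step=0.01):
    """Largest grid value of mu whose acceptance region contains n_obs."""
    accepted = [i * step for i in range(int(mu_max / step) + 1)
                if lr_accepts(n_obs, i * step, b, cl)]
    return max(accepted)


ul_no_bkg = lr_upper_limit(0, b=0.0)  # zero counts, no background
ul_bkg = lr_upper_limit(0, b=3.0)     # zero counts, expected background of 3
```

With zero observed counts, the limit obtained for an expected background of 3 comes out tighter than the one obtained with no background at all, which is the behavior criticized in the text.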

The modification proposed in 0] only softens this behavior, and in addition uses Bayesian 
concepts in its formulation, therefore the uncompromising classical physicist will not want 
to consider it. 

The absurdity of the result is best seen by looking at the case of zero observed events. 
This has been clearly pointed out in [|]. 

If there is no background, and one observes zero events, one knows that no signal event
showed up in the sample at hand, and one can deduce an upper limit on $\mu$ from this fact. If
there is some level of background and one observes zero events, that implies two facts:

• a) no signal event showed in the current sample 

• b) no background event showed in the current sample 

The two occurrences are statistically independent, by assumption of Poisson distribution,
therefore they can be considered separately. Fact b) is totally uninteresting as far as
the signal is concerned: our only help in making decisions about $\mu$ is fact a), which is exactly the same
information we had in the case of no background. A sensible algorithm must therefore give
the same upper limit on $\mu$ in the zero-count case, whatever the expected background.
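
The independence argument can be made explicit with a short worked equation, obtained from Eq. (2) with $n = 0$. The symbol $\mu_{up}$ for the upper limit is introduced here only for illustration, and the second line is the usual classical upper limit for zero counts without background.

```latex
p(0|\mu) = e^{-(\mu+b)}
         = \underbrace{e^{-\mu}}_{\text{fact a)}} \; \underbrace{e^{-b}}_{\text{fact b)}}
% With b = 0, the usual upper limit from zero observed counts solves
e^{-\mu_{up}} = 1 - CL
\quad\Rightarrow\quad
\mu_{up} = -\ln(1 - CL) \approx 3.0 \ \text{at 95\% CL}
```

Only the first factor depends on $\mu$, so the requirement stated above amounts to asking for this same $\mu_{up}$ for every value of $b$.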

This failure is particularly important if one considers that this behavior stems from the 
same root as the other problem that the LR proposal is intended to cure. In fact, the 
problem can be summarized by saying that the low likelihood of occurrence of event b) 
"fools" the algorithm into making up a very narrow confidence region that has no basis in 
what we actually learned from the experiment. This is exactly the same mechanism that 
leads to empty regions with the older rules: the rarity of a set of results is taken as a good 
reason for rejecting values of the parameter even if it is uncorrelated with the value of that 
parameter. This should make us dubious about the question of whether the approach of LR
ordering really addresses the issue. 

This problem was not missed by the proponents of the method, who devote a section 
of their paper to it ||. They maintain that the concern for this problem is motivated by 
"a misplaced Bayesian interpretation of classical intervals", but nonetheless suggest that 
in this kind of cases the experimentalist should not publish just the limits, but also an 
additional quantity to represent the 'sensitivity' of the experiment. This however avoids the 
question of how to provide an interval that properly and completely represents the results of
the measurement, including all information about the sensitivity, that is the question the 
present work tries to address. 

The modification of LR-ordering proposed in || to address this problem is based on 
Bayesian quantities, therefore the uncompromising classical physicist will not accept it. 
Furthermore, it only softens the problem rather than eliminating it. 

A nice classical solution to this dilemma has been presented in Ref . , based on explicitly 
eliminating the spurious information from the calculation of the coverage, while still ordering 
the observable values according to LR. The amount of background events in the sample is 
forced to be less than the total number of observed events. This modification removes 
the paradoxical behavior of the limits, and produces results which seem reasonable from all 
points of view, so the particular problem of the Poisson with background might be considered 
as solved. 

However, the above procedure appears to be ad hoc, and it is not clear how to apply it to
different situations, like the other examples of this section. In addition to that, the example
that follows will show an important weakness of that variant, and of any other variant based
on the LR ordering rule.

2. Gaussian with positive mean 

This is another very important example: $p(x|\mu)$ is Gaussian, but the condition $\mu > 0$
holds. If one tries to apply the Crow band, which is the usual choice for the unbounded
case, one gets an empty confidence region for $x < -1.96$ at 95% CL. This does not happen if
one uses the LR ordering rule, as extensively discussed in 0.
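
The empty region can be seen in a few lines. This is a sketch assuming a unit-width Gaussian, $p(x|\mu) = N(\mu, 1)$, which is what the threshold of 1.96 suggests; the function name is illustrative.

```python
from statistics import NormalDist


def crow_region(x, cl=0.95):
    """Confidence region for mu > 0 from the usual symmetric Gaussian band.

    Assumes a unit-width Gaussian, p(x|mu) = N(mu, 1). Returns (low, high),
    or None when no allowed value of mu lies within z of x.
    """
    z = NormalDist().inv_cdf(0.5 + cl / 2.0)  # about 1.96 for cl = 0.95
    low, high = x - z, x + z
    if high <= 0.0:
        return None  # the whole interval falls outside mu > 0
    return (max(0.0, low), high)


empty = crow_region(-2.5)     # no allowed mu is accepted
truncated = crow_region(1.0)  # lower end clipped at the boundary
```

For `x = -2.5` the function returns `None`, the empty confidence region mentioned above, while for `x = 1.0` the interval is simply cut at the boundary.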

This example makes a very good case for the LR method, but unfortunately it is easy
to expose its instability. Consider a modification of the gaussian pdf obtained by adding
a second, very narrow gaussian of the same height but negligible width and area. Let
the second gaussian be centered at a different location $\mu_2$ that depends on $\mu$. What is
important is just that $\mu_2 \to -\infty$ as $\mu \to 0$.

Intuitively, this is a very small change of the problem: it just means that in a negligible 
fraction of cases the measurement x will fall in a different, narrowly determined location. 
This is not so artificial an example as it may seem, since it is quite possible for an experi- 
mental apparatus to have rare occurrences of singular responses. 

How should the confidence regions change, according to common sense? If the probability
of this occurrence is very small (let's say $\ll 1 - CL$) one would just ignore the possibility
and quote the same confidence limits as before. One would therefore want a sound
algorithm to yield bands very similar to the unperturbed case. Unfortunately, this does
not happen with the LR ordering method: since the ordering is based on the value of the
maximum of the pdf, the narrow peak of negligible physical meaning is capable of altering
the ordering completely: the maximum of the Likelihood is now a constant for every value
of $x$, and the resulting band goes back suddenly to something very similar to the old Crow
band, that is just ordering by $p(x|\mu)$. For large negative deviations, the intervals are not
exactly empty, but contain a tiny interval centered around the peak of the second gaussian.
However, this hardly makes the result satisfying from a physicist's point of view. When
observing a large negative deviation, it is much more likely that it comes from the tail of the
main gaussian rather than from the "extremely rare" second gaussian, and one would like
the confidence limits to reflect this fact. The response of the LR method, which instead
"completely forgets" the main gaussian to focus on the secondary peak, no matter how narrow,
appears as a crucial failure. From a practical point of view, this kind of instability of the
solution means that the response of the apparatus must be known with infinite precision in
order to be able to use the algorithm.

Note that the problem is intrinsic to the ordering, therefore any modification of the 
method acting only on the coverage criteria, as the one proposed in for handling the 
Poisson case, will be plagued by the same problem. 

It is also worth reflecting on what happens if the second peak is not so narrow, but rather 
comparable to the main peak. In that situation, the LR algorithm might give a result which 
is not so violently in contrast with common sense. Yet, it is hard to avoid the suspicion that
also in that case the result will be, in some ill-defined way, not what a physicist wants.

3. Empty confidence regions are not ruled out by the LR method 

The previous example showed a case where LR ordering yields negligibly narrow con- 
fidence regions. For completeness, it is worth noting that it is also possible to formulate 
examples where the LR-ordering produces completely empty confidence regions on wide 
ranges of the observable, contrary to what is generally assumed. 

This can be obtained, for instance, by adding to the pdf a narrow, wiggling ridge of ever
increasing height, still of negligible area. For instance, in the previous example one might
simply add to the pdf the function:

$$\epsilon\, N\left(x_0 + \delta \sin(\mu),\ \frac{\sigma}{1+\mu}\right)$$

where $N(m, \sigma)$ stands for the Gaussian function with unit area, mean $m$ and standard
deviation $\sigma$, and $\delta$, $\sigma$, and $\epsilon$ are real numbers ($\sigma$ and $\epsilon$ are "small"). It is easy to see that
the likelihood function for any $x \in [x_0 - \delta, x_0 + \delta]$ has periodic "spikes" with a height that
increases without limit as $\mu \to \infty$, therefore the maximum likelihood is infinite, and the
LR is zero for all $x \in [x_0 - \delta, x_0 + \delta]$ and all values of $\mu$, including the points on the spikes
themselves. As a consequence, all points in the interval $x \in [x_0 - \delta, x_0 + \delta]$ will get the lowest
possible rank in the ordering, so they will be the last to be added to the accepted region, for
all $\mu$. If an interval is chosen in such a way that $p(x \in [x_0 - \delta, x_0 + \delta]|\mu) < 1 - CL$ (which is
always possible whatever the pdf), then the confidence region will be the empty set for all
$x \in [x_0 - \delta, x_0 + \delta]$.

The example is clearly very artificial, but is nonetheless valuable in signaling the existence
of a problem. 

4. Uniform distribution

An example which is simpler than the previous one and totally plausible in practice, yet
presents unexpected difficulties, is the uniform distribution:

$$p(x|\mu) = 1 \ \text{if} \ \mu < x < \mu + 1, \ \text{otherwise} \ 0. \qquad (3)$$

Let's consider the case of the domain of $\mu$ being the full set of real numbers. The
upper/lower limits present no trouble in this case, but both the Crow band and the LR
band are indeterminate, since every value of $x$ gets assigned the same rank, for whatever
$\mu$, therefore any band satisfying Neyman's condition will satisfy both. In particular, note
that LR-ordering does not exclude empty confidence intervals in this example. Again this
indicates that the root of the difficulties that motivated this approach has not, in fact, been
eliminated.

Anyway, here we are again confronted with instability of the solution: a very small 
perturbation of this pdf, obtained by adding an arbitrary "infinitesimal" function with zero 
total integral will resolve the ambiguity in a way which depends completely on the exact form 
of the perturbation, however small its size. In this case it is not even necessary to consider 
narrow spikes as in previous examples: the instability can be obtained with perfectly smooth 
and slowly varying functions.

Also, there is no obvious way to extend to this case the modifications suggested in Q 
for the Poisson with background example. 

5. Indifferent distributions 

In order to better illuminate the nature of the problem that is frustrating the attempts
at obtaining sound classical limits it is useful to examine a "trivial" example: a probability
distribution that does not depend on the value of $\mu$:

$$p(x|\mu) = p(x) \qquad (4)$$

For simplicity, consider the specific case of a distribution of a discrete observable with 
just two values ('A' and 'B') depending on a parameter with just two possible values ('P' 
and 'Q'), given by the following table: 





          P        Q
A        0.95     0.95
B        0.05     0.05        (5)



Clearly in this case the observable is not providing any information on the parameter.
What is a "sensible" band in this case? Obviously no conclusion can be drawn, so it should
be clear that the only acceptable band is the one that includes the whole table. On the
other hand, most rules will yield an empty region in case 'B'.

The LR is constant everywhere, so the LR-ordering allows you to choose any Neyman
band. By virtue of the economy principle that unneeded overcoverage is to be avoided, the
best solution appears to be the band that covers only the upper row of the table, and leaves
an empty region for case 'B', just as the Crow rule does.

In principle, nothing even forbids arbitrarily choosing to reject one of the two values 'P'
and 'Q' and keep the other when 'B' is observed, thus accepting some overcoverage.
That choice is very unreasonable from a physicist's viewpoint: it means one can conclude
essentially anything from the occurrence of event 'B'. For instance, when investigating the
neutrino mass, one can make an "experiment" by doing something completely unrelated, for
instance, by throwing a pair of dice. Since the probability of getting, say, 6 on both dice is
1/36 < 3%, if that event actually occurs, one is entitled to exclude a mass range of one's choice at
97% CL. I think very few people would accept this as a sensible inference, yet the procedure
is perfectly correct from the point of view of Neyman's definition, and is compatible with
LR-ordering, too.

Here the coverage criterion clearly shows its inadequacy: to obtain a sensible answer it
is not enough that no more than 5% of the outcomes are excluded for every $\mu$; it would also
be necessary to make sure in some way that the choice one makes is not based on information
irrelevant for distinguishing different values of the parameter.

It should be clear at this point that this is the fundamental weakness of Neyman's
definition (1), from which all problems arise. As for the LR ordering rule, it appears to be
going somehow in the right direction, but it is unable to provide a clear-cut answer to a
simple problem like this.

Things get even worse if a small perturbation of the indifferent band is introduced, 
leading to the following situation: 





          P                    Q
A        $0.95 + \epsilon$    $0.95 - \epsilon$
B        $0.05 - \epsilon$    $0.05 + \epsilon$



Common sense clearly suggests not to draw any conclusion in this case, too (not at 95% 
CL, at least). 

The LR method instead provides now unambiguously the answer of a confidence region 
covering all but the lower left cell. This means, no conclusion is drawn from observing event 
'A', but 'P' is excluded if event 'B' is observed. 

Admittedly, now 'Q' is the maximum Likelihood estimate of the parameter, but the
difference with the previous case of "crazy inferences" is infinitesimal. When we claim that
the conclusion has 95% CL, what meaning can we attach to this number if, however small
the difference, the CL is always 95%? It looks like too strong a statement for an infinitesimal
difference between the two hypotheses. Note that the band obtained for this case is exactly
the same that would have been obtained from the following distribution, at the same CL:

          P                    Q
A        $0.95 + \epsilon$    $0.05 + \epsilon$
B        $0.05 - \epsilon$    $0.95 - \epsilon$


yet the two situations are intuitively very different in terms of sensitivity of the experiment 
to the value of the parameter. 
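
The claim that the two tables lead to the same band can be verified with a small sketch of LR ordering for a discrete problem. The function name, the dictionary layout and the value chosen for $\epsilon$ are assumptions made for illustration.

```python
def lr_band(pdf, cl):
    """Build the LR-ordered band for a discrete problem.

    pdf[mu][x] is p(x|mu). Returns band[x], the set of accepted mu.
    """
    outcomes = list(next(iter(pdf.values())))
    best = {x: max(pdf[mu][x] for mu in pdf) for x in outcomes}
    band = {x: set() for x in outcomes}
    for mu, px in pdf.items():
        # Rank outcomes by decreasing p(x|mu) / max over mu' of p(x|mu').
        ranked = sorted(outcomes, key=lambda x: px[x] / best[x], reverse=True)
        total = 0.0
        for x in ranked:
            if total >= cl:
                break  # desired coverage already reached
            band[x].add(mu)
            total += px[x]
    return band


eps = 0.001  # hypothetical size of the perturbation
nearly_indifferent = {
    "P": {"A": 0.95 + eps, "B": 0.05 - eps},
    "Q": {"A": 0.95 - eps, "B": 0.05 + eps},
}
sensitive = {
    "P": {"A": 0.95 + eps, "B": 0.05 - eps},
    "Q": {"A": 0.05 + eps, "B": 0.95 - eps},
}
band_1 = lr_band(nearly_indifferent, 0.95)
band_2 = lr_band(sensitive, 0.95)
```

Both calls return the band that accepts 'P' and 'Q' for outcome 'A' and only 'Q' for outcome 'B', although the first experiment can barely tell the two hypotheses apart.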

This example casts serious doubts on the meaningfulness of evaluations of the sensitivity
of a designed experiment based on expected confidence limits calculated with any current
rule. Again, this is a very serious inconvenience, and the failure in handling so simple an
example should make us suspicious of many other bands, or maybe of all Neyman's confidence
bands.

III. PROPOSAL OF A CLASSICAL SOLUTION 
A. Nature of the problem 

All proposed classical rules for building confidence bands meet with severe difficulties
when confronted even with simple problems. This is true also for the recently proposed
LR-ordering, which appears to do only slightly better than older recipes.

It is worth noting that the characteristics of the most common pdfs taken as examples
of the difficulties (the first two of the previous section) have often led to speaking of a "problem of
bounded regions" or of "small signals". However, the additional examples provided should
be sufficient to clarify that the presence of a boundary, or the smallness of the number of
counts, are just accidents without connection to the root of the problem.

One should ask what is the exact reason for considering the previous examples of confidence
limits unacceptable. Their results are obviously mathematically correct. The problem
is not of a 'mathematical' or 'statistical' but of a 'physics' nature: one is led to setting confidence
limits which are intuitively 'unpleasant' to the physicist, sometimes even paradoxical.
We don't want to accept a result like an empty confidence region, which we know is false
no matter what $\mu$ is, because we feel we could make better inferences by taking that fact into
account somehow. Indeed a result of this kind does not convey much useful information to
the reader. The same can be said for the softer pathology of "unreasonably small" confidence
regions.

It is hard to avoid the suspicion that problems of the kind exemplified above might be
happening even in other cases that we usually regard as problem-free, just because the 
problem is not so apparent to intuition. 

Each of the encountered problems lies in the choice of a particular confidence band, and
in principle can be cured by simply choosing a different band. However, one cannot content
oneself with avoiding the problem case by case by rejecting unreasonable results "by hand".
The weaknesses described above are so important as to undermine the physicist's belief in the
meaning of CL. It is therefore necessary to find a general way to avoid any such "unwanted"
conclusion, even in possibly softer, hidden forms.

The question is: can we state precisely what properties we require from a confidence
band to call it 'physically sensible'? Does a single well-defined procedure exist to construct
one in a generic case?

Let us look in more detail at the meaning of confidence limits. 

Neyman's definition of CL can be phrased in the following way:

"An algorithm is said to have Confidence Level CL if it provides correct answers with
probability at least CL, whatever the value of $\mu$ (or whatever its probability distribution, if
it has one)".

From a practical point of view, this means that if one considers, for instance, the set of
all published limits at 95% CL, the expected fraction of them which is indeed "wrong" (that
is, the limits do not include the true value) is 5% (or smaller, if there is some overcoverage).
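
This frequency reading of the CL can be illustrated with a small simulation. The sketch assumes, purely for illustration, unit-width Gaussian measurements with the usual central interval $[x - z, x + z]$; the function name, the seed and the number of simulated experiments are arbitrary choices.

```python
import random
from statistics import NormalDist


def wrong_fraction(true_mu, n_experiments=20000, cl=0.95, seed=1):
    """Fraction of simulated central intervals that miss true_mu.

    Assumes (for illustration only) unit-width Gaussian measurements and the
    usual central interval [x - z, x + z].
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + cl / 2.0)
    misses = 0
    for _ in range(n_experiments):
        x = rng.gauss(true_mu, 1.0)
        if abs(x - true_mu) > z:
            misses += 1  # the interval does not contain the true value
    return misses / n_experiments


frac_a = wrong_fraction(0.0)
frac_b = wrong_fraction(7.3)
```

Whatever value is used for `true_mu`, the returned fraction stays close to 5%: the statement concerns the ensemble of results, not any single interval.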

For comparison, the definition of Bayesian credibility level, when phrased in a similar
way, sounds like: "An algorithm is said to have a Bayesian credibility level BL if all answers
it produces have at least a probability BL of being correct, provided $\mu$ has the (known)
probability distribution $\pi(\mu)$". In exchange for an additional assumption (the a-priori distribution)
the Bayesian method provides a probability statement about each measurement.

The classical approach cannot possibly do that, since the concept of probability for a
single result of being correct simply cannot be formulated in the classical language: each
particular result is either true or false, since the unknown parameter is taken to have one
(albeit unknown) value, rather than a distribution of possible values. Superficially, however, the
classical method seems to provide a close performance, when saying that the whole set of
results contains only a limited fraction of wrong results ($\leq 1 - CL$).

There is, however, a subtle difference between a statement extracted from a sample containing
95% correct statements, and a statement that has a 95% probability of being true.
The difference is that in the first case some manifestly false or very unlikely statements are
allowed to be part of the set (e.g., empty confidence regions), provided they are a minority,
while in the second case this is not possible: every single possible Bayesian inferred result
is forced to be as likely as all others.

This is the fundamental reason for the absence of pathological conclusions in the Bayesian
approach, that keeps tempting the classical physicist. Its appeal is so strong that even the 
purest classical papers show some slight inclination toward Bayesianism. 

As an example, the method suggested in B for evaluating an experiment sensitivity 
uses the concept of "average limit", which in general requires an a-priori distribution of the
parameter to be assumed, even if the paper only considers the special case of no signal for that
purpose. In j|, after a nice classical suggestion for solving the Poisson problem classically, 
the results are compared to Bayesian results and their similarity is taken as a support to 
their soundness, notwithstanding the fact that if one were to change the a-priori distribution
to something different the Bayesian result would change completely, while the classical result
would always stay the same.

It therefore becomes imperative to ask the question: is there any way to give the
classical method the same solidity without introducing any Bayesian element? If there is
none, then it may be simpler to abandon the classical method completely and use Bayesian
concepts instead.

The purpose of this paper is to suggest that there is indeed a way to obtain the desired
properties in the classical framework.

The definition of CL ensures that the result will be correct in at least a fraction CL of
the cases. An empty region is never a correct conclusion, because $\mu$ has some value by
hypothesis. The definition of CL is not meant to prevent wrong conclusions: it just makes
sure that they happen rarely, the only limitation to an empty CR being that it must not occur
with probability greater than $1 - CL$, whatever the value of $\mu$. In fact, it is easy to see
that, given a set of values of x that has total probability $\le 1 - CL$ for any $\mu$, it is
always possible to assign the empty set as confidence region for all x in the set, provided
the rest of the band is properly adjusted.

This fact may even appear as a kind of inescapable "law" of classical statistics. After all, 
in formal logic one has that given a contradictory (impossible) assumption one can rigorously 
derive any statement. We might have to accept a kind of probabilistic analogue as well, that 
is, from the occurrence of an unlikely event one can statistically infer any statement. 

However, this is actually not the case. What disturbs the physicist is not the mere 
possibility of getting wrong results, which he obviously has to accept, but that one might 
get a wrong result and know it. One could say that those are "unlucky" experimental results. 
But there are good reasons to refuse to surrender to the occurrence of "unlucky" results: 
common sense suggests that once we get a result that we know to be uncommon, there 
should be a way to correctly account for its rarity, rather than getting confused by it. 

In a way, what we really need is to make sure that all experimental outcomes get uniform 
treatment, like in the Bayesian method. 

In this respect, it is worth noting that the strength of the definition of CL lies in its
invariance under any transformation of the space of parameters $\mu$, even non-continuous.
That is, all points of the parameter space get the same treatment, the metric and even the
topology of the parameter space being irrelevant. We could even say that this is the essence
of classical statistics. This is to be contrasted with the Bayesian approach, where the
a-priori distribution sets a well-defined metric in the parameter space. Why, then, do
the classical methods seem to be so much worse than the Bayesian in assuring invariance in
the outcome space?

As a matter of fact, Neyman's definition of CL (1) is symmetric for all values of x.
However, most rules for constructing confidence bands break this symmetry: it is easy to
see that by performing a change of variable in x one obtains different bands.³ This allows
the mere fact that a particular experimental outcome is unlikely for some parameter value
to be used to exclude that value, regardless of the fact that the outcome might be unlikely
for entirely different reasons than the value of the parameter being sought. That probability
might be low for every value of the parameters, so the exclusion of that particular value of
$\mu$ is made on the basis of irrelevant information. Neyman's construction, while compatible
with total symmetry in x, does not explicitly enforce it, because it applies independently to
each value of $\mu$, and there is no way to tell whether the distribution of x has any
dependence on the value of $\mu$.

We need therefore to find a way to prevent the introduction of information irrelevant
to the determination of the parameters in the choice of the confidence band. It should
be intuitively clear that there is a connection between the contamination from irrelevant
information and the unequal treatment of various possible experimental outcomes that is
the basis of paradoxical results.

3 LR ordering is an important exception. See sec. IV for a discussion of this point.

The present approach is in some way the opposite of the attempts to improve the classical
method by the addition of Bayesian elements: it goes in the direction of an even stricter
classical orthodoxy. The use of any metric or topological property of the x space is regarded
as an "a priori bias" producing unequal treatment of some values. That is a kind of
contamination from "Bayesianism" that needs to be eradicated from a pure classical method,
which ought to use only the information contained in the pdf.

B. A stronger concept of Confidence 

We formalize the request that the choice of Confidence Regions must not be based on 
irrelevant information in the following requirement. 

Suppose we take a subset of x values and rescale all likelihoods $p(x|\mu)$ by the same
arbitrary factor (we have to re-normalize the pdf for the rest of the x values after that, of
course). A physically sensible rule for constructing confidence bands must be invariant under
this kind of transformation, since the overall absolute level of probability of the events x
does not affect the information that can be obtained on $\mu$. More precisely, we want to
restrict the set of all possible confidence bands to a subset that satisfies the following
property, which will be called local scale invariance:

DEFINITION - Let $x \in X$ be an observable and $\mu \in M$ a parameter. Let $\mathcal{R}$ be a
rule for selecting confidence bands, that is, a function that associates to each possible
distribution $p(x|\mu)$ a set of Neyman confidence bands with a given CL. We say that
$\mathcal{R}$ is a locally scale-invariant rule if for any two pdf's $p(x|\mu)$ and $p'(x|\mu)$
such that $p'(x|\mu) = c \cdot p(x|\mu)$ for all $\mu \in M$ and for all $x \in \chi \subset X$
(with c a positive constant), and for every confidence band $B \in \mathcal{R}(p)$, there exists
a band $B' \in \mathcal{R}(p')$ such that $B(x) = B'(x)$ for every $x \in \chi$.
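
The transformation used in this definition can be made concrete on a discretized pdf. The following is only a minimal sketch: it assumes a finite matrix `pdf[i][j]` standing for p(x_j | mu_i), and it renormalizes the x values outside the chosen subset proportionally, which is just one of the many extensions the definition allows. The name `rescale_locally` and the numbers are illustrative, not taken from the paper.

```python
def rescale_locally(pdf, chi, c):
    """Scale p(x|mu) by c for x in chi and renormalize the other x values.

    pdf[i][j] holds p(x_j | mu_i); each row is assumed to sum to 1.
    chi is a set of column indices, c a positive constant.
    The proportional renormalization outside chi is an arbitrary choice.
    """
    if c <= 0:
        raise ValueError("c must be positive")
    rescaled = []
    for row in pdf:
        inside = sum(row[j] for j in chi)
        outside = sum(p for j, p in enumerate(row) if j not in chi)
        remaining = 1.0 - c * inside  # probability left for x outside chi
        if remaining < 0 or (outside == 0 and remaining > 1e-12):
            raise ValueError("cannot renormalize this row for the given c")
        k = remaining / outside if outside > 0 else 0.0
        rescaled.append([c * p if j in chi else k * p
                         for j, p in enumerate(row)])
    return rescaled


# Two parameter values (rows), three possible outcomes (columns).
pdf = [[0.5, 0.3, 0.2],
       [0.2, 0.3, 0.5]]
new_pdf = rescale_locally(pdf, chi={0}, c=0.5)
```

A locally scale-invariant rule must be able to assign to `new_pdf` a band that coincides, for the outcome in column 0, with any band it allows for `pdf`.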

This requirement is simple, general, and intuitively satisfying: it says that whatever
algorithm we want to use to choose a CR for a certain set of possible observations, it must
not be influenced by anything other than the dependence on $\mu$ of the probability of the
observations in question. Note that both the observable and the parameter space can be
completely generic sets. We keep requiring all bands to comply with Neyman's condition,
which however does not by itself guarantee the above property, nor does any of the
proposed algorithms for producing confidence bands, including the LR-ordering. The latter
appears clearly from our previous discussion of the example of Poisson with background.

It is interesting to observe that the rank assigned to x by the LR-ordering rule is indeed
invariant under the above transformation, but the coverage criterion used to decide when to
stop adding values of x to the acceptance region is not. The normalization constant creates
the difficulty here, since one can have a region rejected in one case that cannot be rejected
in the other because its contribution to the total integral may be too large.

We will now show that this seemingly weak requirement is actually very stringent in
determining the set of allowed confidence bands, and that it can be turned into a
well-defined procedure for constructing bands.

This is seen from the following theorem.

THEOREM - The largest set of locally scale-invariant bands coincides with the set of
bands satisfying the following requirement:
For every $\mu \in M$ and every $\chi \subset X$:

$$\frac{p(x \in \chi,\ \mu \notin B(x)\,|\,\mu)}{\sup_{\mu'} p(x \in \chi\,|\,\mu')} \le 1 - CL \qquad (6)$$

whenever the denominator is non-zero.
PROOF:

Part 1 - All bands in a locally scale-invariant set satisfy condition (6).
Suppose (6) does not hold. Then there is a band B, a subset $\chi$ and a parameter value $\mu$
such that

$$\frac{p(x \in \chi,\ \mu \notin B(x)\,|\,\mu)}{\sup_{\mu'} p(x \in \chi\,|\,\mu')} > 1 - CL \qquad (7)$$

Then consider a new pdf defined inside $\chi$ by:

$$p'(x|\mu) = \frac{p(x|\mu)}{\sup_{\mu'} p(x \in \chi\,|\,\mu')}$$

and arbitrarily extended outside $\chi$. This is always possible since by construction
$\int_{\chi} p'(x|\mu)\,dx \le 1$ for every $\mu$.
Obviously, for every $\mu$:

$$p'(\mu \notin B(x)\,|\,\mu) \ge p'(x \in \chi,\ \mu \notin B(x)\,|\,\mu)$$

And from (7):

$$p'(x \in \chi,\ \mu \notin B(x)\,|\,\mu) = \frac{p(x \in \chi,\ \mu \notin B(x)\,|\,\mu)}{\sup_{\mu'} p(x \in \chi\,|\,\mu')} > 1 - CL$$

then

$$p'(\mu \notin B(x)\,|\,\mu) > 1 - CL$$

which contradicts Neyman's condition. Therefore B could not be part of an invariant set,
in contradiction with the hypothesis. Therefore eq. (6) is proved.

Part 2 - The set of all bands satisfying (6) is a locally scale-invariant rule.

First of all, note that (6) implies Neyman's condition as a special case (just take $\chi = X$).

Take any p, $\chi \subset X$, $c > 0$, and $p' = c \cdot p$ for all $x \in \chi$. Note that the
ratio in (6) does not change when the pdf is scaled by a constant, so if B satisfies (6) for p
in $\chi$ and all its subsets it will also satisfy it for p'. Let's define $B' = B$ in $\chi$
and $B' = M$ (the whole parameter space) outside $\chi$. Then, for any $\xi \subset X$ we have:

$$\frac{p'(x \in \xi,\ \mu \notin B'(x)\,|\,\mu)}{\sup_{\mu'} p'(x \in \xi\,|\,\mu')}
= \frac{p'(x \in (\xi \cap \chi),\ \mu \notin B'(x)\,|\,\mu)}{\sup_{\mu'} p'(x \in \xi\,|\,\mu')}
\le \frac{p'(x \in (\xi \cap \chi),\ \mu \notin B(x)\,|\,\mu)}{\sup_{\mu'} p'(x \in (\xi \cap \chi)\,|\,\mu')}
= \frac{p(x \in (\xi \cap \chi),\ \mu \notin B(x)\,|\,\mu)}{\sup_{\mu'} p(x \in (\xi \cap \chi)\,|\,\mu')}
\le 1 - CL$$

This means B' satisfies (6) for the distribution p'. Since we defined $B' = B$ in $\chi$, this
proves local scale-invariance of the set of bands given by (6).

Part 1 and 2 together show that the two sets coincide, concluding the proof. Note that
they implicitly prove that the "largest set of locally scale-invariant bands" indeed exists,
which was not granted a priori.⁴

Condition (6) is clearly connected with the intuitive concept of uniform treatment of
all experimental results, and offers a much clearer indication than the equivalent
scale-invariance requirement about how to construct in practice a satisfying confidence band.

It also appears as a natural extension of Neyman's CL concept, because it amounts to
simply applying at local level the same requirement Neyman imposed on the observable
space as a whole.

This fact suggests an alternative formulation: rather than regarding (6) as a rule for
identifying a particular subset of confidence bands, we can take this condition as a new, more
restrictive, definition of limits within the classical framework ("Strong Confidence Limits")
and define a new quantity ("strong CL", or "sCL") in analogy with the usual CL (eq. (1)):

$$sCL(B) = 1 - \sup_{\mu,\,\chi} \frac{p(x \in \chi,\ \mu \notin B(x)\,|\,\mu)}{\sup_{\mu'} p(x \in \chi\,|\,\mu')}$$

The strong CL is then a quantity that can be evaluated for a completely arbitrary band,
just as the regular CL. Note that it is always $sCL \le CL$, in accordance with the greater
strength of the concept.
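
For a small discrete problem the definition of sCL can be evaluated by brute force. The sketch below is illustrative only: it assumes a finite matrix `pdf[i][j]` for p(x_j | mu_i) and a band stored as one set of accepted parameter indices per outcome. The function names and the toy numbers are hypothetical, and the loop over all subsets of outcomes grows exponentially with their number.

```python
from itertools import combinations


def strong_cl(pdf, band):
    """Brute-force sCL of a band for a discrete pdf.

    pdf[i][j] = p(x_j | mu_i); band[j] = set of mu indices accepted at x_j.
    """
    n_mu, n_x = len(pdf), len(pdf[0])
    worst = 0.0
    for size in range(1, n_x + 1):
        for chi in combinations(range(n_x), size):
            # Denominator of the ratio: sup over mu' of p(x in chi | mu').
            denom = max(sum(row[j] for j in chi) for row in pdf)
            if denom == 0:
                continue  # the condition is only required for non-zero denominators
            for i in range(n_mu):
                # Numerator: p(x in chi, mu_i not in B(x) | mu_i).
                num = sum(pdf[i][j] for j in chi if i not in band[j])
                worst = max(worst, num / denom)
    return 1.0 - worst


def ordinary_cl(pdf, band):
    """Usual CL: the minimum coverage over all parameter values."""
    return min(sum(p for j, p in enumerate(row) if i in band[j])
               for i, row in enumerate(pdf))


pdf = [[0.9, 0.1],
       [0.1, 0.9]]
band = [{0}, {1}]            # accept mu_0 at x_0 and mu_1 at x_1
scl = strong_cl(pdf, band)   # 1 - 0.1/0.9, roughly 0.889
cl = ordinary_cl(pdf, band)  # 0.9
empty_scl = strong_cl(pdf, [set(), set()])  # 0.0 for an all-empty band
```

In this toy case the strong CL is slightly lower than the ordinary CL, and a band with empty regions gets a strong CL of zero.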



IV. PROPERTIES OF STRONG CONFIDENCE REGIONS 

The meaning of strong confidence can be summarized as follows: take a subsample of
possible experimental results, however defined. While it is still not guaranteed that the
probability for them to be correct is at least CL as with Bayesian methods, what we gained
over Neyman's CL is that, independently of the a-priori distribution of $\mu$, the number of
wrong results is a small fraction of the maximum expected number of results of that kind.
That is, there may be distributions for $\mu$ that lead all results in that category to be
false, but in that case those results will present themselves much more rarely than when they
lead to correct conclusions, and this holds for all possible results in the same way. This is
basically how far we can go within the classical framework in terms of getting "individually
certified" results.⁵



4 We could have proved the existence beforehand, by observing that the union of any number of 
invariant rules is still an invariant rule, therefore the largest invariant rule is immediately identified 
as the union of all possible invariant rules. 

5 Note that, even if one assumes that a distribution of the parameter exists, a probability statement
about each result is impossible to obtain in the classical framework without knowing the distribution
(a priori), unless one chooses the trivial solution of the band covering the whole space.

The possibility of empty confidence regions is ruled out here in full generality, unlike the
case of LR ordering: if there is a set $\chi$ of values for which the confidence region is
empty, then obviously there exists a $\mu$ for which the ratio on the left side of (6) is
arbitrarily close to 1. Unless CL=0, that means the total probability of $\chi$ is identically
zero for all $\mu$.

It is also easy to see that strong bands are stable under small perturbations of the pdf like
those previously discussed in the examples. This is due to the fact that the requirements
being made are based on integrals of the pdf rather than its pointwise values. The integrals
over all subsets that are not too small stay the same after the addition of perturbations of
small total probability. The effect is only seen on small scales, and it just forces the
addition of the small region where the perturbation is large to the unperturbed band.

A. Independence from change of variable 

The above defined strong bands have another interesting property: they are invariant
under any change of variable in x-space. This is obvious since the probabilities appearing in
the ratio in (6) scale proportionally under any change of variable. We stressed before that
the strength of the classical approach lies in its independence from the metric in
$\mu$-space, that is, in its equanimity with respect to every value of $\mu$, in contrast with
the Bayesian approach where all values are explicitly weighted for relative a-priori
importance.

It should be clear as well that the use of a particular metric of x space in constructing
a CR is a way to introduce a-priori discriminations between values of x, that is, to
introduce arbitrary (irrelevant) information in the choice of the confidence band, so it is not
surprising that this invariance is a consequence of our approach.

Amongst all common rules for selecting CR, LR-ordering is the only one to be independent
of transformations of x. The partial success of the LR ordering principle might in the
end be traced back to its compliance with this requirement of independence from the metric in
x space. In fact, the LR ordering rule is equivalent to the narrowest band in that particular
metric in parameter space that makes the maximum likelihood value constant for all $\mu$.

We have seen, however, that while this property is desirable, and probably necessary for 
physically sensible results, it is not sufficient to ensure them. 

B. Construction of Strong Confidence Regions 

A simple and useful corollary of (6) is:

COROLLARY - If the observable is discrete⁶, then for every value of x, any strong band
always includes all values of $\mu$ such that $p(x|\mu) > p(x|\hat\mu) \cdot (1 - sCL)$,
with $\hat\mu$ the maximum-likelihood value of $\mu$ for that x.
PROOF: just put $\chi = \{x\}$ in (6).
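
The region singled out by the corollary is straightforward to compute for a discrete observable. The sketch below is only an illustration: it assumes the Poisson-with-background likelihood, a finite grid of parameter values standing in for the continuous parameter (so the maximum is taken over the grid), and hypothetical names `poisson` and `core_region`.

```python
import math


def poisson(n, mean):
    """Poisson probability of observing n counts for the given mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def core_region(likelihoods, scl):
    """Indices of the mu values that every strong band must include at this x.

    likelihoods[i] = p(x | mu_i) for a fixed observed x on a grid of mu values;
    the maximum over the grid stands in for the maximum-likelihood value.
    """
    peak = max(likelihoods)
    return [i for i, like in enumerate(likelihoods) if like > (1.0 - scl) * peak]


b = 3.0                                   # expected background (illustrative)
mu_grid = [0.1 * i for i in range(101)]   # signal values 0.0, 0.1, ..., 10.0

# Zero observed events: the likelihood peaks at mu = 0.
core_n0 = core_region([poisson(0, mu + b) for mu in mu_grid], scl=0.9)

# Five observed events: the likelihood peaks at mu = n - b = 2.
core_n5 = core_region([poisson(5, mu + b) for mu in mu_grid], scl=0.9)
```

On this grid, `core_n0` contains the signal values from 0.0 to 2.3, that is, those with exp(-mu) > 0.1. Any strong band at the given sCL may be wider than this region, but never narrower.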



6 It does not hold in full generality for a continuous variable, since it is always possible to choose
an arbitrary band for a single isolated x without affecting any of the formulas above, which always
refer to events or sets of events of non-zero total probability. However, it is intuitively expected
that it holds for continuous variables too, provided some regularity condition is imposed.

This immediately shows another appealing feature of strong CL: it is forbidden to exclude
any value of $\mu$ having a likelihood "close" to the max-likelihood value. Again, this is not a
property of any other method (LR-ordering just tends to give such values somewhat high
ranks, and $\hat\mu$ gets the highest rank whenever it exists, but there is no guarantee on
their actual inclusion in the band. Other rules do even less than that).

Note that, just as in Neyman's CL, in a generic case there may be many different "legal"
bands for a given pdf and sCL, therefore the question of the choice between them reappears.
However, since there is now no fear of unreasonable results, the only reason for pinpointing
a general and unique choice is just to prevent the possible distorted practice of choosing the
band after the experiment, and the question is largely a matter of convenience. In order to
be coherent with the spirit of the current approach, however, the choice must be formulated
in such a way as to be invariant under any transformation of x.

For instance, a good choice might be to minimize the coverage for every value of $\mu$
independently. This makes for the lowest possible CL for the given sCL. Conversely, the
bands chosen in this way will have the highest sCL for a given CL. They can be considered
with good reason the "best band" for a given CL, in case an experimenter wishes to fix
the desired value of CL as usual, rather than the sCL. Obviously, if the maximum sCL
corresponding to a given CL is small or even zero, that means no physically sensible band
is possible without increasing the CL (that is, "overcovering" of all values of $\mu$ is
necessary).

In practice the freedom of choice is often very limited, since the "core region" identified
by the above corollary must be completely included by any legal strong band at the given
sCL. That core region is defined only by the pdf for the local values of x, and is therefore
not affected by changes in the pdf for other values.

The actual determination of the bands in other than the simplest cases requires numerical 
calculations. We now describe a simple algorithm to construct in practice a band satisfying 
the criteria. 

In order to do numerical calculations, the pdf must be discretized if the parameter or
the observable are continuous. This is achieved by sampling the parameter space with an
N-dimensional grid, and splitting the space X of the observable into a finite number of
regions. Those regions are considered as possible discrete outcomes, and their probabilities
are obtained by integrating the density $p(x|\mu)$ over each of the regions. In this way, a
rectangular matrix is obtained, independently of the dimensionality of the x and $\mu$ spaces,
which may both be arbitrary-length vectors of numbers. This matrix is used as input in the
following simple algorithm.
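
Before turning to the algorithm itself, the discretization step just described can be sketched as follows. This is only an illustration under stated assumptions: a one-dimensional observable with a unit-width Gaussian density centred on the parameter (an example chosen here, not one treated in this section), a one-dimensional parameter grid, and hypothetical names `gaussian_cdf` and `discretize`.

```python
import math


def gaussian_cdf(x, mean, sigma=1.0):
    """Cumulative distribution of a Gaussian, used to integrate the density."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def discretize(mu_grid, edges):
    """Build the matrix pdf[i][j] = probability of region j given mu_grid[i].

    edges are the interior boundaries of the regions; the first and last
    regions extend to minus and plus infinity, so each row sums to 1.
    """
    bounds = [-math.inf] + list(edges) + [math.inf]
    return [[gaussian_cdf(hi, mu) - gaussian_cdf(lo, mu)
             for lo, hi in zip(bounds[:-1], bounds[1:])]
            for mu in mu_grid]


mu_grid = [0.5 * i for i in range(9)]       # parameter values 0.0, 0.5, ..., 4.0
edges = [-2.0 + 0.5 * k for k in range(17)]  # region boundaries -2.0, ..., 6.0
pdf = discretize(mu_grid, edges)
```

Each row of `pdf` corresponds to one grid value of the parameter and each column to one region of the observable, whatever the dimensionality of the original spaces.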

All intervals of x are initially assigned to the rejected region, that is, the band is
initialized to be empty. For each value of $\mu$, one loops over all possible sets composed of
any number of the chosen x regions. Condition (6) is checked on all sets in turn, and if it is
found to be violated, one of the regions in the current set is added to the confidence band,
and removed from any further checks. The set of accepted regions obtained upon completion of
this procedure for all values of $\mu$ is a strong band. The freedom in the choice of the
region to be added to the band is what allows different solutions to be generated.
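
The procedure above can be sketched for a small discrete problem as follows. This is a minimal illustration rather than the implementation used for the results of this paper: it assumes a matrix `pdf[i][j]` for p(x_j | mu_i), checks only sets of still-rejected regions (accepted regions can only lower the ratio being tested), and restarts the scan after each addition. The names `first_violation` and `strong_band` and the three choice rules are assumptions of this sketch, and the cost grows exponentially with the number of regions.

```python
from itertools import combinations


def first_violation(pdf, i, rejected, limit):
    """Return the first set of rejected regions violating the condition for mu_i."""
    cols = sorted(rejected)
    for size in range(1, len(cols) + 1):
        for chi in combinations(cols, size):
            denom = max(sum(row[j] for j in chi) for row in pdf)
            if denom > 0 and sum(pdf[i][j] for j in chi) / denom > limit:
                return chi
    return None


def strong_band(pdf, scl, choose="ratio"):
    """Build a band with strong CL of at least scl; band[j] = accepted mu indices."""
    n_x = len(pdf[0])
    limit = 1.0 - scl
    band = [set() for _ in range(n_x)]

    def single_ratio(i, j):
        peak = max(row[j] for row in pdf)
        return pdf[i][j] / peak if peak > 0 else 0.0

    for i in range(len(pdf)):
        rejected = set(range(n_x))       # the band starts out empty
        while True:
            bad = first_violation(pdf, i, rejected, limit)
            if bad is None:
                break
            if choose == "low":          # always add the lowest region
                j = min(bad)
            elif choose == "high":       # always add the highest region
                j = max(bad)
            else:                        # add the region with the largest ratio
                j = max(bad, key=lambda col: single_ratio(i, col))
            rejected.remove(j)
            band[j].add(i)
    return band


pdf = [[0.9, 0.1],
       [0.1, 0.9]]
loose = strong_band(pdf, scl=0.85)   # each outcome keeps only its likely mu
tight = strong_band(pdf, scl=0.95)   # both mu values are kept everywhere
```

The loop for each parameter value ends after at most as many additions as there are regions, since every violation moves one region into the band.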

It is not obvious how to achieve the minimal coverage requirement suggested above within
this stepwise procedure. There are however simple and reasonable recipes for performing
the choice step of the algorithm. One can, for instance, systematically choose the
lowest/highest x to get the analogue of lower/upper limits in the standard approach, or choose
the region with the highest value of the ratio tested by condition (6). The latter appears
particularly natural and has the interesting characteristic of representing an extension of
the LR-ordering rule to the sCL context, even if the result might be slightly dependent on
the order in which the sets of regions are being checked by the algorithm.



C. Sample Applications 



The definition of strong CL gives satisfying answers to all problems listed in Sec. II B.

In some cases the solution follows immediately from the corollary above. 

One of them is the "indifferent" pdf, where the conclusion that no value of the parameter
can be excluded, whatever the required sCL, is immediately found, and it is stable under small
perturbations of the pdf.

For the uniform distribution, the full range of $\mu$ for which $L(\mu) > 0$ gets included,
whatever the chosen sCL. This strong statement reflects the intuitive arbitrariness of any
choice wishing to exclude some value of a parameter in favor of others having exactly the
same likelihood. In fact, when a problem with uniform pdf is encountered, most physicists
don't even formulate a question of Confidence Limits, but just quote the absolute extrema
of the allowed interval for $\mu$.

For the Poisson with background, it is easy to see that the result for the case of zero
observed events will be independent of background. The probability of zero events is
$e^{-\mu} e^{-b}$, so by changing the expected background b one changes the likelihood by a
simple multiplicative constant. From the definition of local scale invariance one has
immediately that the limits for this case cannot depend on b. This statement needs a bit of
clarification: we have remarked that the strong band is not uniquely identified in a general
case, therefore one can make various choices. What is guaranteed here is that all possible
choices for the limits from zero counts for a given value of b are also acceptable choices for
any other value of b. This does not imply that one must necessarily make the same choice in
the two cases.
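
As a worked illustration of the zero-count case, the single-outcome ratio entering the corollary can be written out explicitly. The sketch assumes a non-negative signal, $\mu \ge 0$, so that the supremum is reached at $\mu' = 0$; the numerical value is for sCL = 0.9.

```latex
\frac{p(0\,|\,\mu)}{\sup_{\mu' \ge 0} p(0\,|\,\mu')}
  = \frac{e^{-\mu}\, e^{-b}}{e^{-b}}
  = e^{-\mu},
\qquad
e^{-\mu} > 1 - sCL
  \;\Longleftrightarrow\;
  \mu < -\ln(1 - sCL) \approx 2.30 \quad (sCL = 0.9).
```

The background b cancels in the ratio, so the set of values that any strong band must include for zero observed counts is the same for every b. The band actually chosen may extend beyond this set.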

We have calculated the confidence limits in the special case of b = 3.0 using the simple
method outlined in the previous section, and compared the results with other classical
methods in Table I. The upper, lower and the LR-ordering analogue choices mentioned
above are shown. The intervals obtained are wider than with any other method.

This should not be considered a loss of power, but rather regarded as a reflection of
the higher standards of quality required of the result. The parts of the band that would
be excluded by other methods are here included on just the same basis that yields the
correct conclusion for the zero-count case, and prevents crazy conclusions from indifferent
distributions: their likelihood is not low enough with respect to the maximum value. These
considerations suggest that one should not consider this widening of the band a loss of power
unless one also considers a loss of power the inability to draw conclusions on the mass of
neutrinos by throwing dice.



V. SUMMARY 

The current methods for determining classical Confidence Limits produce counterintuitive
results in a variety of situations. This includes the recent proposals based on
Likelihood Ratio ordering, which are not immune from the problem of empty confidence
regions.

By imposing the requirement that only the information contained in the shape of the
Likelihood function be used in determining the limits, a stronger definition of classical limits
is derived, which is a natural extension of Neyman's original condition.

These "strong confidence limits" turn out to be immune to the problem of empty accepted
regions, and stable under small perturbations of the probability distribution, at the price of
some widening of the usual limits.

TABLES

TABLE I. Comparison of 90% confidence intervals for Poisson with background level of 3.0,
for LR ordering, modified LR [?], and strong CL ("pseudo-LR", low, and high band).